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NOTE  ON  A  NEW  COMPUTATIONAL  DATA  SMOOTHING  PROCEDURE 


SUGGESTED  BY  MINIMUM  MEAN  SQUARE  ERROR  ESTIMATION 

* 

P .  Swer ] ing 

Consultant  to  The  RAND  Corporation,  Santa  Monica,  California 

I  .  INTRODUCTION 

Reference  1  contains  a  discussion  of  some  possible  computational 
procedures  for  finding  the  exact  minimum  mean  square  error  (MMSE) 
estimates  of  parameters  in  the  case  where  the  observations  depend  non¬ 
linear  ly  on  the  unknown  paiameters.  The  procedures  analyzed  there  are 
for  the  case  where  the  observation  noise  is  Gaussian  and  additive. 

It  was  pointed  out  in  that  paper  that  the  applicability  of  such 
estimation  procedures,  that  is,  those  that  would  yield  MMSE  estimates 
if  the  noise  were  Gaussian  and  additive,  is  not  necessarily  limited 
to  cases  where  the  noise  is  actually  Gaussian  or  additive.  An  analogy 
was  made  to  least  squares  procedures,  which  would  be  maximum  likelihood 
if  the  noise  were  additive  and  Gaussian,  but  whose  applicability  is 
not  restricted  to  such  cases  (although  if  these  conditions  are  not 
satisfied  one  can  no  longer  necessarily  make  the  same  claims  regarding 
optimality)  . 

•k 

Any  views  expressed  in  this  paper  are  those  of  the  author.  They 
should  not  be  interpreted  as  reflecting  the  views  of  The  RAND  Corporation 
or  the  official  opinion  or  policy  of  any  of  its  governmental  or  private 
research  sponsors.  Papers  are  reproduced  by  The  RAND  Corporation  as  a 
courtesy  to  members  of  its  staff. 


In  contrast  to  lrast  squares  procedures,  the  MMSE  procedures 
achieve  exact  (not  merely  asymptotic)  minimum  mean  square  estimation 
error  even  when  the  observational  data  depend  non-linearly  on  the 
unknown  parameters. 

Also,  the  computational  procedures  are  quite  different.  In  the 
least  squares  procedures,  not  only  is  the  optimality  criterion  the 
minimization  of  the  statistically  expected  square  of  the  error,  but 
the  computational  procedure  involves  minimizing,  with  respect  to  the 
parameters,  a  function  of  both  the  observed  data  and  the  parameters; 
and  this  function  is  a  weighted  sum  of  squares,  or  more  generally 
a  positive  quadratic  form,  in  the  residuals. 

In  contrast,  while  the  optimality  criterion  for  the  MMSE  pro¬ 
cedures  is  still  the  minimization  of  the  statistically  expected 
square  error,  the  computational  procedures  do  not  necessarily  involve 
the  minimization  of  a  function  of  observed  data  and  parameters;  rather, 
they  involve  the  evaluation  of  certain  integrals  over  the  parameter 
space.  Although  the  computational  implementation  of  MMSE  estimation 
is  often  very  difficult,  there  may  be  cases  in  which  the  evaluation  of 
the  necessary  integrals  is  more  convenient,  even  from  the  computational 
point  of  view,  than  the  minimization  required  in  least  squares  pro¬ 
cedures  . 

In  the  "strong  signal"  case  the  least  squares  procedure  can  be 
linearized;  that  is,  the  functional  dependence  of  the  observed  data 
on  the  unknown  parameters  can  be  treated  as  linear  within  some  region 
surrounding  the  true  parameter  values,  and  other  regions  of  the  para- 


3 


meter  space  can  be  effectively  ignored.  In  such  cases,  the  estimates 
obtained  are  asymptotically  the  same  as  those  obtained  from  MMSE  pi  o- 
cedures . 

More  generally,  if  the  logarithm  of  the  likelihood  function  is 
convex  in  the  unknown  parameters  in  a  region  surrounding  the  true 
values,  for  sufficiently  strong  signal  everything  other  than  this 
region  may  be  ignored,  and  iterative  techniques  may  be  applied  to  find 
the  least  squares  estimate. 

The  purpose  of  this  note  (following  a  suggestion  made  in  Ref.  1) 
is  to  investigate  the  form  taken  by  MMSE  estimation  procedures  in  the 
case  where  the  dependence  of  the  observed  data  on  the  parameters  can, 
with  good  approximation,  be  considered  to  contain  only  constant, 
linear,  and  quadratic  terms.  The  resulting  procedures  can  be  con¬ 
sidered  as  alternatives  to  the  iterative  procedures  used  in  least 


squares. 


II.  MMSE  ESTIMATES  WITH  ADDITIVE  GAUSSIAN  NOISE 


It  is  assumed  that  the  observed  data  can  be  represented  in  the 

form 

S(t)  =  f(Xj,  ....  xn,  t)  +  e(t) 

where  t  is  a  time  parameter  (actually,  it  need  only  be  assumed  that 
t  is  a  parameter  whose  value  is  known--it  need  not  be  interpreted 
as  time,  but  it  is  convenient  to  speak  of  it  in  this  way); 

Xf,  •••,  are  n  real,  unknown  parameters;  and  {fi(t)}  is  the  observa¬ 
tion  uror. 

We  will  write 

x  =  (x: ,  • • • ,  xn) 
x  e  X 

where  X  is  a  parameter  set,  and  n  is  the  number  of  unknown  real 
parame  ters . 

The  parameter  vector  x  is  assumed  to  have  an  a-priori  proba¬ 
bility  density  function  p (x)  over  X. 

A1  so,  it  will  be  assumed  initially  for  purposes  of  derivation 
that  the  observed  data  consist  of  samples  of  S(t)  at  a  number  N  of 
discrete  time  points  j^t  ^j-,  p,  =  1,  .  .  .  ,  N,  so  that  the  observed  data 
are 

«  S(t^)  »  f(x,  t^)  +  £(t^) ,  p,  “  1,  N 

Eventually,  however,  the  results  will  also  be  obtained,  by  a  limiting 
process,  for  cases  where  the  function  S(t)  is  continuously  observed 


over  some  interval. 
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Suppose  the  set  of  random  variables  [£  )  are  Gaussian,  have  zero 

u 

means,  and  have  covariance  matrix 


where 


£  £  =  cp  =  cp  ( t  ,  t  ) 

M-  LL  '■> 


cp(s,  t)  =  £(s)  £(t) 


(4) 

(5) 


Tlien,  the  joint  probability  density  of  [s  },  conditional  on  the 

M- 

parameters  having  given  values,  is 


-N 

P(S  |  x)  =  (2tt)  2  a"*  exp  i 


N 


l  ‘  L  V  [Su  ‘  f(x>  V][Sv  ‘  f(x'  VJj  <6> 

PL,U=1 


where 


S  -  (sv  .. .,  sN) 

x  =  (x  ,  •  •  ■  ,  x  ) 
i  n 


A  =  det 


(7) 


T1 


UU 


(  _1) 


If  the  a-priori  probability  density  of  x  over  X  is  p(x),  tnen 
the  joint  p.d.f.  of  S  and  x  is 

p(S,  x)  =  p  (S  |  x)  p  (x )  (8) 


with  p (S |  x)  given  by  Eq .  (6).  (These  various  density  functions  do 
not  necessarily  have  the  same  functional  form,  even  though  the  symbol 
p  is  used  in  each  case;  ambiguity  is  removed  by  noting  the  argument 
of  p . ) 

Let  0(x)  =  i3(Xj,  ...,  x^)  be  any  real-valued  function  of 
x i ,  ...,  x^  having  finite  second  moment  with  respect  to  p(x). 
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Then,  the  MMSE  estimate  of  p(x),  denoted  by  g,  is  that  function 
of  S  which  minimizes 

I*  J  [F(S)  -  9(x)]2  p(S,  x)  dS  dx 
Q  X 

for  all  functions  F(S)  of  S.  Here,  0  is  the  set  consisting  of  the 
possible  values  of  the  vector  S. 

It  is  then  easy  to  show  that  the  MMSE  estimate  B  is*'^ 

3 (x)  p(x)  exp  (-^  A(x)  +  u(x)}  dx 

&  -  j -  (9) 

p(x)  exp  {-if  A(x)  +  u(x)}  dx 
X 

where  V7 

N 

A (x)  «=  Y  Tj  f(x,  t  )  f(x,  t  )  (10) 

L-i  U 

^=1 

N 

u  (x)  =  V  T|  f  (x,  t  )  S(t  )  (11) 

L  P  U 

Vl,°“l 

If  the  observational  data  consists  of  the  function  S(t)  observed 
continuously  over  an  interval  (T^ ,  T2),  then  0  is  given  by  Eq .  (9) 
but  with  A(x)  and  u(x)  In  Eqs.  (10)  and  (11)  replaced  by  their  limit¬ 
ing  values  as  N  ®  and  as  the  points  [t  }  become  dense  In  (T^,  T^)  • 

Methods  of  evaluating  such  limits  are  discussed  at  length  in 
Refs.  (2)  -  (4).  If  might  be  noted  that  A(x)  is  essentially  a  kind 
of  signal- to-noise  ratio  appropriate  to  noise  having  the  covariance 
function  (p(s,  t),  while  u(x)  is  a  linear  operation  on  the  received 
signal  which  is  equivalent  to  that  generalization  of  cross-correlation 
which  is  appropriate  to  noise  having  the  covariance  function  cp(s>  0  • 
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In  fact,  in  the  white-noise  case,  in  the  limit 


T2 

2  \  r2 


A  (x)  =  —  J  f  (x,  t)  dt 


u(x)  =  —  1  f (x,  t)  S(t)  dt 

N  j 

°T. 

where  N  =  one-sided  noise  spectral  density, 
o 

Now,  we  will  consider  the  expansion  of  both  u(x)  and  A(x)  through 
second  order  in  [x  ].  In  essence,  we  will  expand  f(x,  t)  through 
second  order  in  { x ^ }  for  purposes  of  approximating  u(x).  However,  in 
approximating  A(x),  the  full  expansion  of  f(x,  t)  to  second  order 
will  not  be  used;  rather,  A(x)  will  itself  be  expanded  through  at 
most  second  order. 

This  can  be  expected  to  yield  very  accurate  approximations  to 
A(x)  in  many  cases  of  interest,  since  there  are  many  such  cases  in 
which  A(x)  is  either  independent  of  x  or  depends  on  x  in  a  way  which 
car  be  well  approximated  by  retaining  only  terms  through  quadratic. 

Thus,  suppose  x  “  (>:.^,  ...,  x^)  is  some  definite  value  of  x  and 
that  in  the  neighborhood  of  x 


t)  -  f(x,  t)  +  £ 


(xi  -  V 


'  I 


t,j=l 


a2f(*.  o 

dx.  Bx  ^  i 
i-  j 


Xi )  (Xj  -  Xj) 
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n 

A  (x)  =  A  (x)  +  ^ 


(x. 


+  ^  L 

i.  j  =  l 


a2A(x) 
3x  .  8x 


Xi)(xj 


X  .  1 

J 


It  may  be  noted  that  the  derivatives  of  A(x)  can  easily  be 
evaluated  in  terms  of  those  of  f(x,  t)  via  Eq .  (12),  in  the  white 

noise  case,  or  more  generally  via  Eq .  (10). 

For  notational  convenience,  let 


(15) 


8x . 

l 


d.(x,  t) 


a2f(x,  t) 

Bx .  dx  . 
i  J 


°i j (x,  t) 


3A(x)  _  s 

toT  '  Vx) 

1 


Then, 


=  b..M 

dx,  Bxj  ij 


n 

V 


n 

n 


A (x)  =  A (x)  +  ^  a.(x.  -  xi)  +  ^  ^  bi j  (xi 

i=l  i,j=l 

n  n 

I  vi<xi  ■ ’'i)  +  ^  L  zij(xi 


Xi  ) (x  -  x  ) 


X.)(x.  -  X.) 


i=l 


i,j  =  l 


(16) 

(17) 

(18) 

(19) 

(20) 


u  (x)  =  u (x)  + 


(21) 
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N 


whe  rc 


u(x)  = 


Y  t  )  S  ( t  ) 

L,  Li.U  ,1  'J 


N 

v.  =  )  Tj  d.  (x,  I:  )  S(t  ) 

i  L.  u t  i  u  u 

p.,u=l 

N 

)  T|  c..(x,  t  )  S(t  ) 

L  ij  ii  u 


ij 


p,,  u-l 


or  the  limiting  values  of  these  as  it  j  becomes  dense  in  (T  ,  T  ) 

p  l  2 

For  the  white  noise  case,  the  limiting  values  become 

T 


u(x)  =  l 


f  (x,  t)  S(t)  dt 


o  T 


rL  =  di(x,  t)  S(t) 


dt 


(22) 


(23) 


(24) 


(25) 


(26) 


T2 

ZU  "  T  j  eij(x>  s(t)  dt 

o  _ 

T1 


(27) 


Now  make  the  further  assumption  that 


-n , 


72  -h 

P(x)  =  (2n)  '  T  * 


ll 

exp{-*  I  Vxi  '  xiHxj  ■  V} 


i>  j  =  l 


(28) 


where 


T  =  det  y 


-1 


In  other  words,  x^  are  now  taken  to  be  the  a-priori  means  of  x^, 
and  the  a-priori  joint:  distribution  of  the  parameters  is  taken  to  be 


Gaussian . 
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Wc  then  have 


n 

r~  ■* 


P(x)  exp 


!xp{-*  l 


0  - 


i,  j  =  l 


!ij(xi  ‘  xi)(xj  ’  V  +  L  ci(xi  "  xi}| dx 

i  =  1 


(29) 


.j 

X 


where 


exp 


x.)(xj 


i,  j=l 


xj) +  I  hCxi 

i  =  l 


x.)  j-  dx 


l.  .  =  V.  •  +  ^5  b.  .  -  k  z.  • 


£  .  =  -k  a .  +  v  . 

l  l  l 


(30) 

(31) 


In  Eqs.  (30)  and  (31),  and  z_  are  linear  functions  of  the 
observed  data  S(t). 

Now, 

-*Z-ij<xi  -  xi)(xj  -  xj>  +Zci(xi  -  V 
=  Z  5ij<xi  -  xi)(xj  -  +^Ihj  xi  xj 


where 


\  =  I  ^‘^ij  +  Xi 

j  =  l 


(32) 


(33) 


Thus 


ii 

J  POO  exp  |  -%  ^  5ij(xi  -  x.)(Xj  -  Xj)i 

\r  i  1  _  1  *' 


dx 


P  = 


i,j  =  l 


[-%  l  hj<xi  -  xi)(xj  -  v} 


exp 


dx 


(34) 
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Now  let  us  further  suppose  that  (§  )  is  a  positive  matrix 

Then  we  can  write 


0  =  Er$(x);  E ;  x] 


where  E  ^(x);  x]  is  the  expected  value  of  0(x)  with  respect  to  a 
joint  Gaussian  distribution  of  (x^,  •••,  x^)  having  means  x^  and 

inverse  covariance  matrix 


(35) 
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III.  VARIOUS  SPECIAL  CASES 


First,  suppose  that  p(x)  is  a  function  of  only  one  of  the  com¬ 
ponents,  say 


3  (x)  =  g(x  .  N 

Then,  clearly, 


(36) 


A  A 


0=8  = 


1 


2n[(iT1)..T* 

1  1 


CD 

g(x  .  ) 


(X.  -  X.)' 


i-  exp  H  ~~i 


dx. 


«  hi 


(37) 


In  particular,  setting  0(x)  =  x^ ,  then 


x  .  =  x  . 

l  l 


(38) 


If  we  set  0 (x)  =  x.,  then 

l 


/\ 

2  ~  2  -1 
*i  ‘  <*i>  +  <•  >u 


(39) 


Similarly,  the  MMSE  estimate  of  any  power  of  x^  can  be  ex¬ 
pressed  as  a  function  of  x.  and  (E  S..- 

in 

Next,  suppose  that  we  set 


0  ( x  )  =  x  .  x  . 

i  .1 


(40) 


for  any  particular  i  and  j. 


/  \  ~  ~  _  i 

X.  x.=x.  x.  +  (E  ).. 
1  J  1  J  1J 


(41) 


Then 
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By  well-known  formulas,  the  MMSE  estimate  of  any  function  of 
the  form 


n 

0(x)  =  TT  (x.  ) 
i  =  l 


k. 

1 


can  be  determined  as  a  function  of  x^,  i  =  1, 
i , j  =  1,  n. 


n,  and 


U' 


In  the  foregoing  formulas, 


x . 

t 


is  given  by  Eq . 


(33). 


(42) 
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lV.  ADDITIONAL  REMARKS  ON  THE  COMPUTATIONAL  IMPLICATIONS 


It  has  previously  been  mentioned  that  the  procedure  which  has 
been  described  is  not  necessarily  limited  in  applicability  to  cases 
where  the  statistical  conditions  are  satisfied  such  that  the  resulting 

estimates  are  really  MMSE,  any  more  than  least  squares  procedures  are 

a 

limited  to  cases  where  they  are  really  maximum  likelihood. 

Within  reasonable  limits,  one  may  regard  the  procedure  defined 

by  the  foregoing  equations  as  generally  applicable,  even  if  the  noise 

(£(t)j  and  the  a-priori  parameter  distribution  are  not  really  Guassian. 

Moreover,  one  can  use  the  white  noise  formulas  for  v.,  z..,  etc.,  even 

l  lj 

if  the  errors  (£(t)}  are  really  correlated.  Within  reasonable  limits, 
one  can  regard  the  matrices  T|^  and  y  as  arbitrary  choices  of  the 
designer  of  the  computation  procedure.  Naturally,  the  statistical 
quality  of  the  resulting  estimates  will  be  best  if  these  represent 
the  "true"  inverse  covariance  matrices  of  the  noise  errors  and  the 
a-priori  estimates,  respectively;  but  it  is  in  many  cases  desirable 
to  trade  statistical  quality  for  computational  convenience. 

The  procedure  stated  above  can  also  be  used  as  a  step  in  an 
iterative  procedure,  in  the  following  way.  We  can  regard  the  quanti¬ 
ties  x^  as  the  initial  estimates  in  the  iterative  procedure. 

A 

Then,  we  can  obtain  x^  by  means  of  Eqs.  (38)  and  (33).  Then, 

A  - 

x,  can  be  used  in  place  of  x^  in  precisely  the  same  procedure--i .  e .  , 

A  — 

with  v.,  z..,  a.,  b..  evaluated  at  x,  instead  of  x..  This  would 
i  ij  i  ij  i  i 

produce  another  estimate  (with  the  same  observational  data)  which 

/ 

A 

might  be  denoted  (x^)  .  The  procedure  would  then  continue  until  the 


estimate  settled. 
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